1 Tutorial goals and setup

The goal of this tutorial is to explore how to fit hidden Markov models (HMMs) to movement data. To do so, we will investigate the R package momentuHMM. This package builds on a slightly older package, moveHMM, developed by Théo Michelot, Roland Langrock, and Toby Patterson (see the associated paper: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12578). momentuHMM was developed by Brett McClintock and Théo Michelot, and adds new features such as support for additional data streams, the inclusion of covariates as raster layers, and much more (see the associated paper: https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.12995).

The primary learning objectives in this first tutorial are to:

  1. Select an appropriate resolution for the data
  2. Handle missing locations
  3. Fit simple HMMs using momentuHMM
  4. Check model fit
  5. Determine the number of states to use
  6. Incorporate and interpret covariates on behaviour transition probabilities
  7. Incorporate covariates in the observation model

1.1 Setup and data description

First, let’s load the packages that we will need to complete the analyses. You will need to have them installed first.
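The exact library() chunk is not shown in this excerpt; based on the functions used throughout the tutorial, a plausible set of packages is the following (treat this list as an assumption and adjust it to your setup):

```r
# Packages inferred from the functions used in this tutorial (assumed list)
library(momentuHMM)  # crawlWrap, prepData, fitHMM, viterbi, stateProbs, ...
library(dplyr)       # mutate, filter, lag, lead, group_by, summarise
library(lubridate)   # ymd_hms, round_date, ceiling_date, floor_date, hour
library(sf)          # st_as_sf, st_set_crs, st_transform
library(ggOceanMaps) # basemap
library(ggspatial)   # geom_spatial_path
library(knitr)       # kable
library(kableExtra)  # kable_styling, add_header_above
```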

While the raster package is being phased out in favour of the newer terra package, momentuHMM still relies on raster to extract covariates. Therefore, this tutorial still uses raster, but do start working with terra in your own work where possible. Make sure your working directory is the “CANSSI_OTN_HMM_2023” HMM workshop folder, then set it to the “Morning_Tutorial” subfolder:

setwd("Morning_Tutorial")

HMMs assume that locations are taken at regular time steps and that there is negligible position error. For example, HMMs are adequate for GPS tags that take locations at a set temporal frequency (e.g., every \(2\) hrs). Without reprocessing, HMMs are not suitable for irregular time series of locations or locations with large measurement error (e.g., you would need to preprocess Argos data before applying HMMs).

Unfortunately, the movement of aquatic species is rarely recorded at regular time intervals and without measurement error. The data set we will work on is the movement track of three narwhals tagged with Fastloc GPS tags, provided by Dr. Marianne Marcoux. For simplicity, we only examine two weeks of data from August 2017 for these three individuals, and we only use the Fastloc GPS data, so we do not have to deal with location error.

2 Import data and initial data processing

First, let’s import the narwhal movement data and convert the time column to an appropriate date format.

tracks_gps <- read.csv("data/tracks_gps.csv")%>%
  mutate(time = ymd_hms(time))

The data we obtain are often quite messy, with some records missing information and others duplicated. We can keep only complete location data by filtering with !is.na(x) & !is.na(y). To remove duplicate records (same time, location, and location class), we will use the lag function from dplyr, which returns the value from the previous row so that we can compare it to the current row.

tracks_gps <- tracks_gps %>% 
  # remove missing locations
  filter(!is.na(x) & !is.na(y)) %>% 
  # remove identical records; the first row has no previous row
  # (lag() returns NA), so keep it explicitly
  filter(is.na(lag(time)) |
           !(time == lag(time) & x == lag(x) & y == lag(y) & loc_class == lag(loc_class)))

Now we will plot the tracks with the ggOceanMaps package, which shows the tracks on a map with limits set by the longitude and latitude extent of our data (bbox below). This package is convenient since it works directly with longitude and latitude, so there is no need to transform our dataset into an sf object. In the crs argument, we define the coordinate reference system of the original data (in this case WGS84, which is defined by the EPSG code 4326).

# Transform the dataset into a data.frame with longitude, latitude and ID of the animal
df <- data.frame(Lon = tracks_gps$x, Lat = tracks_gps$y, ID = tracks_gps$ID)

# Define the geographical limits (box) of the background map: c(min_lon, max_lon, min_lat, max_lat)
bbox <- c(min(tracks_gps$x), max(tracks_gps$x), min(tracks_gps$y), max(tracks_gps$y))

#Plot the data, with background map
basemap(limits = c(bbox), bathymetry = TRUE) + 
  geom_spatial_path(data=df, aes(x = Lon, y = Lat,colour=factor(ID)), crs = 4326)+
  ggtitle("Movement data of the narwhals")

3 Selecting a time interval for the HMM

An HMM assumes the observations are collected in discrete time and that there are no missing data in the predictor variables. When the data are irregular, there are two key decisions we must make: (1) the temporal resolution to use, and (2) how to address data gaps. The desired resolution depends predominantly on the biological question you are asking, as different behaviours and biological processes occur at different spatial and temporal scales (e.g., seasonal migration, daily movement between foraging and resting grounds, and fine-scale foraging decisions). Generally, higher-resolution data are preferred as they contain more information. However, the resolution can also be too high, such that fine-scale variability drowns out the signal from coarse-scale patterns of interest (e.g., seasonal migration). In this case, we will be linking the movement data with high-resolution (75 s) dive data to look at finer-scale behaviours (on the order of a few hours). My rule of thumb is that you want 3-50 data points per behaviour. For behaviours spanning several hours, that roughly corresponds to a desired resolution between 2 min and 60 min.
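As a quick arithmetic check of that rule of thumb (my illustrative numbers, not package code):

```r
# Number of locations in a ~3 h behavioural bout at candidate resolutions
res  <- c(2, 10, 30, 60)   # candidate resolutions (min)
bout <- 3 * 60             # bout duration (min)
setNames(bout / res, paste0(res, " min"))
# e.g., a 10 min resolution gives 180/10 = 18 locations per 3 h bout,
# comfortably within the 3-50 range
```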

First, let’s calculate the time difference between successive records using difftime and lead (compares current row to following row) and place these values in a new column called dt. Note that the time difference is in minutes (units = "mins"). For the last record of each individual (i.e., when ID != lead(ID)), we will set the value to NA.

# Calculate time difference between locations
tracks_gps <- tracks_gps %>%
  mutate(dt = ifelse(ID == lead(ID), # If next data row is same individual
                     # calculate time difference
                     difftime(lead(time), time, units = "mins"), NA))

Let’s see what resolutions may be possible in the data by looking at the most frequent time gaps.

# Visualise time differences
hist(tracks_gps$dt, 1000, main = NA, xlab = "Time difference (min)")

# Zoom in on short intervals
hist(tracks_gps$dt, 1000, main = NA, xlab = "Time difference (min)", xlim = c(0,100))

# identify the most frequent dt
tracks_gps %>% 
  {table(.$dt)} %>% 
  sort(decreasing = TRUE) %>% 
  head()
## 
##  10  11  12  22   9  13 
## 400 145  90  64  63  63

We see that the most frequent time gap is \(10\) min, followed by \(11\), \(12\), \(22\), \(9\) and \(13\) min. We also see the majority of the gaps are \(< 60\) min, however some are in excess of \(600\) min. Because HMMs assume observations taken at regular time intervals, finer resolutions will contain more data gaps that would need to be interpolated. Frequent and large data gaps can be difficult to handle, especially as the number of missing data points approaches or exceeds the existing data. Let’s examine the potential data structure at different resolutions for the different animals.

We first create a function that approximates the proportion of missing locations we would have for a given resolution.

# Make function to estimate proportion of missing locations 
p_na <- function(time, res) {
  time <- round_date(time, paste(res,"min")) # round to nearest resolution
  time <- na.omit(time[time != lead(time)]) # remove duplicate time
  # calculate maximum number of locations          
  max_n_loc <- length(seq(time[1], tail(time, 1) + res*60, 
                          by = as.difftime(res, units = "mins")))
  n_NA <- max_n_loc - (length(time)+1)
  n_NA / max_n_loc
}

We can now use this function to look at the proportion of NAs we would get with 10, 20, 30, and 60 min resolution.

# summarise track dt
tracks_gps %>% 
  group_by(ID) %>% 
  summarise(p_NA_10m = p_na(time, 10),     # 10 min 
            p_NA_20m = p_na(time, 20),     # 20 min 
            p_NA_30m = p_na(time, 30),     # 30 min 
            p_NA_60m = p_na(time, 60)) %>% # 60 min
  # return formatted table
  kable(digits = 3, col.names = c("Narwhal ID", paste(c(10,20,30,60), "m"))) %>%
  kable_styling() %>% 
  add_header_above(c("", "Resolution" = 4))
                         Resolution
Narwhal ID    10 m    20 m    30 m    60 m
T172062      0.486   0.236   0.191   0.152
T172064      0.502   0.203   0.120   0.074
T172066      0.488   0.230   0.185   0.160

Here we see that at the \(10\) min interval, around \(50\%\) of the locations would be missing. This is a lot, but if we chose a finer resolution, interpolated data would outweigh the real data and could bias the results. For this tutorial, we will use a \(10\) min resolution.

4 Handling data gaps

There are several ways to deal with data gaps, and we will address two:

  1. Interpolation (correlated random walk)
  2. Voiding data gaps

4.1 Interpolation (correlated random walk)

For large datasets with few and small gaps, the simplest approach is to use linear interpolation between missing times. A slightly better way to interpolate locations is to fit a continuous-time correlated random walk (CTCRW) model to the data and predict the most likely locations. momentuHMM contains wrapper functions to interpolate missing locations by fitting a CTCRW to the data based on the crawl package by Devin Johnson and Josh London. There are many options to fit the CTCRW model, and a detailed tutorial for analysis with crawl is available here: https://jmlondon.github.io/crawl-workshop/crawl-practical.html. Let’s try to fit the most basic model using the wrapper function crawlWrap. In the most basic form, we only need to provide tracking data with the columns ID, time, x, and y and specify the desired temporal resolution.

First, let us transform the data into an sf object. crawlWrap can also take a data.frame as an argument but that would imply renaming some of our columns. It is easier to just transform the data into an sf object.

tracks_gps_sf <- tracks_gps %>%
  st_as_sf(coords = c("x", "y")) %>% # converts to an sf object
  st_set_crs(4326) %>% # define CRS
  st_transform(2962) # reproject data to a UTM
# crawl can fail to fit periodically, so I recommend always setting a seed 
set.seed(12)

# fit crawl
crw_gps_10 <- crawlWrap(obsData = tracks_gps_sf, timeStep = "10 mins")
## Fitting 3 track(s) using crawl::crwMLE...
## Individual T172062...
## Individual T172064...
## Individual T172066...
## DONE
## 
## Predicting locations (and uncertainty) at 10 mins time steps for 3 track(s) using crawl::crwPredict... DONE

# check that all parameters were properly estimated 
crw_gps_10$crwFits %>% 
  lapply(function(x){
    x[c("par","se","ci")] %>%  # get estimated values 
      unlist() %>%  # unlist
      is.nan() %>%  # identify values that failed to estimate
      sum() == 0 # count failed estimates and ensure there are 0
  }) 
## $T172062
## [1] TRUE
## 
## $T172064
## [1] TRUE
## 
## $T172066
## [1] TRUE
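To see the interpolated tracks, we can plot the crawlWrap output; momentuHMM provides a plot method for the returned crwData object (shown here as a sketch):

```r
# plot observed locations and CTCRW-predicted paths for each individual
plot(crw_gps_10, ask = FALSE)
```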

Notice how the predicted tracks do not make perfectly straight lines through missing sections (particularly noticeable in T172062).

For later stages in this tutorial, I want to use the tracks predicted using this CRW method, so we will extract the predicted locations and add them to the observed data.

# filter predicted locations from predicted CRW model
tracks_gps_crw <- data.frame(crw_gps_10$crwPredict) %>% 
  filter(locType == "p") %>% 
  dplyr::select(mu.x, mu.y, time,
                ID) %>% 
  dplyr::rename(x = "mu.x", y = "mu.y")

It is now possible to fit an HMM on the tracks_gps_crw dataset. We will see later in this tutorial how to do so.

4.2 Voiding data gaps

One strategy to address data gaps is to leave the data streams (i.e., step length and turning angle) as NAs. For my own work, I have typically voided sections with \(>6\) missing locations. The maximum size of a gap to allow depends on the frequency of the missing data, frequency of locations, study species, and behaviours of interest.

Voiding gaps is particularly useful for moderate and large gaps; however, it may often be the best option for small gaps as well. The only catch is that although HMMs can handle missing data in the state-dependent observation data (e.g., step length and turning angle), they cannot have missing data in covariates on the transition probabilities (for example, if there were an effect of bathymetry on state probability).

In this case, we will void the data streams for locations predicted in gaps \(>60\) min. First, we will identify which steps need to be voided; then we will prepare the data and void the estimated step and angle data streams. We will do this again later in the tutorial, so we will wrap it into a function called prepData_NAGaps. The function will output a momentuHMMData object that can be fit directly using fitHMM.

This method will not work on data with multiple individuals. To get around this, we can split the data into a list where each element is the location data for one ID. We can then use the R apply family of functions to process each animal separately.

# convert tracks back to data.frame with xy coordinates
tracks_gps_ls <- tracks_gps %>% 
  split(., .$ID)  # split into list
# define function to replace step and turn angle of large gaps with NA
NA_gaps <- function(tracks, times){
  # rows where location is within a large gap
  rows <- which(
    rowSums(apply(times, 1, function(X, tracks){
      dplyr::between(tracks, 
                     as.POSIXct(X[1], tz = "UTC"),
                     as.POSIXct(X[2], tz = "UTC"))
    }, tracks$time))==1)
  tracks$step[rows] <- NA
  tracks$angle[rows] <- NA
  return(tracks)
}
# define function to identify and void gaps
prepData_NAGaps <- function(track_list, tracks_crw, res, max_gap, ...){
  # for each ID, identify which rows have gaps >= max_gap 
  gaps_ls_rows <- lapply(track_list, function(x){
    which(difftime(lead(x$time), x$time, units = "mins") >= max_gap)
  })
  
  # create sequence of times for each track from gaps >= 60 min
  gap_ls <- mapply(FUN = function(track, gaps){
    # identify start and end date of each gap
    gaps_ls_srt_end <- list(start = ceiling_date(track$time[gaps], paste(res, "min")),
                            end = floor_date(track$time[gaps+1], paste(res, "min")))
    # combine into single vector for each track
    data.frame(start = gaps_ls_srt_end$start, end = gaps_ls_srt_end$end)
  },
  track_list, gaps_ls_rows, SIMPLIFY = FALSE)
  
  # prep data and list by ID
  prep_tracks <- prepData(tracks_crw, ...) %>% 
    {split(., .$ID)}
  
  # Interpolate the location at the times from the sequence
  mapply(FUN = NA_gaps, prep_tracks, gap_ls,
         SIMPLIFY = FALSE) %>% 
    do.call("rbind", .) # convert list of tracks back to a single data.frame
}

prep_tracks_gps_crw_NAgaps <- prepData_NAGaps(tracks_gps_ls, tracks_gps_crw, 
                                              10, 60, type = "UTM")

# get time of day covariate (which will be used later)
prep_tracks_gps_crw_NAgaps$tod <- hour(prep_tracks_gps_crw_NAgaps$time) + minute(prep_tracks_gps_crw_NAgaps$time)/60

head(prep_tracks_gps_crw_NAgaps)
##                ID     step     angle                time         x       y
## T172062.1 T172062 253.5446        NA 2017-08-08 00:20:00 -285113.8 8176359
## T172062.2 T172062 231.0461 0.7415044 2017-08-08 00:30:00 -285315.1 8176513
## T172062.3 T172062 451.1109 0.8454366 2017-08-08 00:40:00 -285545.3 8176493
## T172062.4 T172062 689.7662 0.1654594 2017-08-08 00:50:00 -285813.7 8176130
## T172062.5 T172062 506.6135 0.2621016 2017-08-08 01:00:00 -286127.2 8175516
## T172062.6 T172062 241.1625 1.0919775 2017-08-08 01:10:00 -286232.7 8175020
##                 tod
## T172062.1 0.3333333
## T172062.2 0.5000000
## T172062.3 0.6666667
## T172062.4 0.8333333
## T172062.5 1.0000000
## T172062.6 1.1666667

The main idea is to use tracks_gps_ls to locate the large gaps with the NA_gaps function and then fill them with NAs in the tracks_gps_crw dataset. We use the tracks_gps_crw dataset because its smaller gaps are already filled and its time resolution is already set.

Now, it is possible to fit an HMM on prep_tracks_gps_crw_NAgaps.

Another strategy to deal with larger gaps is to segment the tracks with a new individual ID. This may be particularly appropriate for gaps where we may reasonably expect that the underlying states are effectively independent of one another. Specifically, we may ask: over what period of time does the behaviour of the animal affect its subsequent behaviour? In this case, we may expect that the behaviour of a narwhal depends on its behaviour over the preceding several hours, but is independent after 24 hours. We can split the tracks at gaps larger than a predetermined threshold by iterating the ID column. For each segmented path, it is then possible to void the large gaps and fit a correlated random walk over the smaller gaps, as we did before.
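A minimal sketch of that splitting strategy, assuming a hypothetical 24 h threshold and the tracks_gps columns used above:

```r
# give each segment separated by a gap > 24 h its own ID (illustrative threshold)
tracks_split <- tracks_gps %>%
  group_by(ID) %>%
  mutate(big_gap = c(0, as.numeric(diff(time), units = "mins")) > 24 * 60,
         ID = paste(ID, cumsum(big_gap), sep = "_")) %>%
  ungroup() %>%
  dplyr::select(-big_gap)
```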

In the next part of this tutorial, we will explore how to fit hidden Markov models (HMMs) to the prep_tracks_gps_crw_NAgaps dataset in the R package momentuHMM.

5 Fit HMM to data

The primary learning objectives of this part are:

  1. Fit simple HMMs using momentuHMM
  2. Checking model fit
  3. Determining number of states to use

5.1 Fitting the Model

Our data are ready to use, and we are almost ready to fit the HMM (via the function fitHMM). This is where we need to make many of our modelling decisions, and most of these decisions are associated with one of the arguments of the function. The minimum arguments fitHMM requires are: fitHMM(data, nbStates, dist, Par0).

  1. When we fit an HMM, we need to decide the number of behavioural states we are going to model. To start simple, we will only use two behavioural states. These could represent, for example, one behaviour with fast and directed movement (e.g., travelling) and one with slower and more tortuous movement (e.g., residing). This means that the argument nbStates will be set to \(2\).

  2. We need to select the distribution used to model the step lengths and the one used to model the turning angles. For now, we will use the gamma distribution for the step lengths and the von Mises for the turning angles. These are commonly used distributions for modelling movement data. This means that the argument dist will be set to: list(step = "gamma", angle = "vm"). Note that dist should be a list with an item for each data stream column in the data that we are modelling (here, the columns step and angle). The gamma distribution is strictly positive (i.e., it does not allow for \(0\)s). If you have step lengths that are exactly zero in your data set, you need to use zero-inflated distributions, but in this case we have no zeros.

  3. We need to decide whether we make the transition probabilities between the behavioural states dependent on covariates. Here, we will start simple and we will not use covariates.

  4. We need to select initial values for the parameters to estimate. HMMs are fit using maximum likelihood estimation (MLE) with a numerical optimizer. An unfortunate aspect of fitting models with numerical optimizers is that, to explore the parameter space and find the best parameter values, the optimizer needs to start somewhere, and you need to decide where. Choosing bad starting values can result in parameter estimates that are not the MLE, but just a local maximum. To choose your initial parameters you can take a quick peek at the data (e.g., using the plots below) and use some general biological information. For example, it’s common for animals to have one behaviour with long step lengths and small turning angles and one behaviour with short step lengths and larger turning angles. From the plots below (note that we can simply use plot on our data), it looks like the animal has step lengths that are close to \(100\) and \(600\). There could be a third behaviour with larger step lengths, but for now we will consider only two.

Note, by default, fitHMM will set the argument estAngleMean to NULL, which means that we assume that the mean angle is \(0\) for both behaviours (i.e., the animal has a tendency to continue in the same direction) and that only the angle concentration parameters differ. The concentration parameters control the extent to which the animal continues forward versus turn. Doing so reduces the number of parameters to estimate. These are all very important decisions that you must make when you construct your model.
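For reference, if we did want to estimate the mean turning angles rather than fix them at \(0\), we could set estAngleMean and add starting values for the angle means (a sketch using the same illustrative starting values as below, not run here):

```r
# With estAngleMean, Par0$angle becomes c(mean_1, mean_2, conc_1, conc_2)
fit_angleMean <- fitHMM(prep_tracks_gps_crw_NAgaps, nbStates = 2,
                        dist = list(step = "gamma", angle = "vm"),
                        Par0 = list(step = c(100, 600, 50, 300),
                                    angle = c(0, 0, 0.1, 1)),
                        estAngleMean = list(angle = TRUE))
```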

plot(prep_tracks_gps_crw_NAgaps)

plot(prep_tracks_gps_crw_NAgaps$step ~
       prep_tracks_gps_crw_NAgaps$time, ty = "l", ylab = "Step length",
     xlab = "Date", las = 1)
abline(h = 100, col = rgb(1, 0, 0, 0.5))
abline(h = 600, col = rgb(1, 0, 0, 0.5))

Based on visual examination of the data, we might choose our initial parameters to be \(100\) and \(600\) for the mean step lengths and use half those values for the standard deviations. The turning angles are either very close to \(0\) or spread fairly widely from \(-\pi/2\) to \(\pi/2\). High concentration parameter values (\(\kappa\), pronounced "kappa") mean that the animal has a tendency to keep moving in the same direction. Values close to \(0\) mean that the turning angle distribution is almost uniform (the animal turns in all directions). Note that \(\kappa\) cannot be exactly \(0\), so let’s choose \(0.1\) and \(1\).

# define states (optional)
stateNames <- c("resident", "travel")
nbState <- length(stateNames)
# define distribution to use for each data stream
dist <- list(step = "gamma", angle = "vm")

# Setting up the starting values
mu0 <- c(100, 600) # Mean step length
sigma0 <- mu0/2 # SD of the step length
kappa0 <- c(0.1, 1) # Turn angle concentration parameter
# combine starting parameters 
Par0 <- list(step = c(mu0, sigma0), angle = kappa0)

Ok, we are ready. Let’s fit the HMM and look at the parameter estimates.

# Fit a 2-state HMM
HMM_gps_crw_NAgaps <- fitHMM(prep_tracks_gps_crw_NAgaps, 
                             stateNames = stateNames, nbStates = nbState, 
                             dist = dist, Par0 = Par0)

# Let's look at parameter estimates
HMM_gps_crw_NAgaps
## Value of the maximum log-likelihood: -17884.91 
## 
## 
## step parameters:
## ----------------
##      resident   travel
## mean 282.0578 754.6505
## sd   153.8772 251.0847
## 
## angle parameters:
## -----------------
##               resident   travel
## mean          0.000000  0.00000
## concentration 2.277153 18.01728
## 
## Regression coeffs for the transition probabilities:
## ---------------------------------------------------
##                1 -> 2    2 -> 1
## (Intercept) -2.601642 -2.563677
## 
## Transition probability matrix:
## ------------------------------
##            resident     travel
## resident 0.93096716 0.06903284
## travel   0.07151298 0.92848702
## 
## Initial distribution:
## ---------------------
##  resident    travel 
## 0.3025479 0.6974521

We can also use the plot function to plot the outputs. This will show us the estimated state-dependent distributions, and estimated states for each location.

# plot outputs (for one animal)
plot(HMM_gps_crw_NAgaps, ask = FALSE)
## Decoding state sequence... DONE

Based on the mean step length parameters, it looks like the first behavioural state (resident) has smaller step lengths than state 2. This is particularly easy to see in the step histogram (first figure), where the estimated distribution for each state is overlaid on the observed step length frequencies. The turning angle distribution of the second state (travel) indicates much more directed movement, with a higher angular concentration. Looking at the track, we can see the estimated transitions between states.

5.2 Identifying Behavioural States

We are interested in identifying when an animal is in each of the behavioural states (i.e., when the animal is in state 1 vs state 2), something we call state decoding. We can use the function viterbi, which uses the Viterbi algorithm to produce the most likely sequence of states according to your fitted model and data.

# Apply the Viterbi algorithm using your fitted model object
dec_States <- viterbi(HMM_gps_crw_NAgaps)
# Let's look at predicted states of the first 20 time steps
head(dec_States, 20)
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
# How many locations in each state do we have?
table(dec_States)
## dec_States
##    1    2 
## 1476 1446

In many cases, it is more interesting to get the probability of being in each state rather than the most likely state. To do so, you can use the function stateProbs, which returns a matrix of the probability of being in each state for each time step.

# Calculate the probability of being in each state
statep <- stateProbs(HMM_gps_crw_NAgaps)
# Let's look at the state probability matrix
head(statep)
##       resident       travel
## [1,] 0.9944545 5.545466e-03
## [2,] 0.9999782 2.183931e-05
## [3,] 0.9948147 5.185281e-03
## [4,] 0.6298960 3.701040e-01
## [5,] 0.7077568 2.922432e-01
## [6,] 0.9999950 4.965048e-06

We can see here the probability of being in each state for the first \(6\) time steps. Here, the probability of being in state 1 is really high for these steps, but that may not always be the case. Sometimes you might have values closer to \(0.5\), which would indicate that for that time step, it’s hard to identify which state the animal is in (i.e., which step length and turning angle distributions fit best).

You can plot the entire sequence from both of these functions using the following code:

plotStates(HMM_gps_crw_NAgaps)

5.3 Checking the Model

The first thing we need to look into is whether the parameter estimates are reliable (i.e., if we are finding the global maximum). As mentioned above, the initial parameter values can affect the estimating procedures. So it’s always good to check if you get similar results with different starting values. Here, we will make the starting values of the two behavioural states closer to one another.

# Setting up the starting values
mu2 <- c(400, 600) # Mean step length
sigma2 <- mu2/2 # SD of the step length
kappa2 <- c(1, 1) # Turn angle concentration parameter
# combine starting parameters 
Par2 <- list(step = c(mu2, sigma2), angle = kappa2)

# Fit the same 2 state HMM with the new starting values
HMM_gps_crw_NAgaps2 <- fitHMM(prep_tracks_gps_crw_NAgaps, 
                              stateNames = stateNames, nbStates = 2, 
                              dist = dist, Par0 = Par2)

Let’s compare the two results. First, let’s look at which of the two has the lowest negative log-likelihood (equivalent to the highest log-likelihood, so closer to the true MLE). Let’s also look at the parameter estimates each returned.

# Negative log likelihood
c(original = HMM_gps_crw_NAgaps$mod$minimum, new = HMM_gps_crw_NAgaps2$mod$minimum)
## original      new 
## 17884.91 17884.91
# Parameter estimates
cbind(HMM_gps_crw_NAgaps$mle$step, HMM_gps_crw_NAgaps2$mle$step)
##      resident   travel resident   travel
## mean 282.0578 754.6505 282.0578 754.6505
## sd   153.8772 251.0847 153.8772 251.0847
cbind(HMM_gps_crw_NAgaps$mle$angle, HMM_gps_crw_NAgaps2$mle$angle)
##               resident   travel resident   travel
## mean          0.000000  0.00000 0.000000  0.00000
## concentration 2.277153 18.01728 2.277153 18.01728
cbind(HMM_gps_crw_NAgaps$mle$gamma, HMM_gps_crw_NAgaps2$mle$gamma)
##            resident     travel   resident     travel
## resident 0.93096716 0.06903284 0.93096716 0.06903284
## travel   0.07151298 0.92848702 0.07151298 0.92848702

Looks like they both returned very close values for everything. So that’s good! The function fitHMM also has the argument retryFits, which perturbs the parameter estimates and retries fitting the model. We use it when we think the estimates could be a local maximum. Since the optimization of HMMs depends on the starting values chosen, it is always good to explore multiple initial values and then keep the best estimates among them (which is what retryFits does). The argument indicates the number of times you want to perturb the parameters and refit the model (you can choose the size of the perturbation by setting the argument retrySD).

Let’s try (this will take a few minutes).

# Fit the same 2-state HMM with retryFits
# This is a random pertubation, so setting the seed to get the same results
set.seed(29)
HMM_RF <- fitHMM(prep_tracks_gps_crw_NAgaps,
                 nbStates = 2,
                 dist = list(step = "gamma", angle = "vm"),
                 Par0 = list(step = c(mu2, sigma2), angle = kappa2),
                 retryFits = 10)
## Attempting to improve fit using random perturbation. Press 'esc' to force exit from 'fitHMM'
## 
    Attempt 1 of 10 -- current log-likelihood value: -17884.91         
    Attempt 2 of 10 -- current log-likelihood value: -17884.91         
    Attempt 3 of 10 -- current log-likelihood value: -17884.91         
    Attempt 4 of 10 -- current log-likelihood value: -17884.91         
    Attempt 5 of 10 -- current log-likelihood value: -17884.91         
    Attempt 6 of 10 -- current log-likelihood value: -17884.91         
    Attempt 7 of 10 -- current log-likelihood value: -17884.91         
    Attempt 8 of 10 -- current log-likelihood value: -17884.91         
    Attempt 9 of 10 -- current log-likelihood value: -17884.91         
    Attempt 10 of 10 -- current log-likelihood value: -17884.91

Let’s compare the results again.

# Negative log likelihood
c(original = HMM_gps_crw_NAgaps$mod$minimum, new = HMM_gps_crw_NAgaps2$mod$minimum,
  retryFits = HMM_RF$mod$minimum)
##  original       new retryFits 
##  17884.91  17884.91  17884.91
# Parameter estimates
cbind(HMM_gps_crw_NAgaps$mle$step, HMM_gps_crw_NAgaps2$mle$step, HMM_RF$mle$step)
##      resident   travel resident   travel  state 1  state 2
## mean 282.0578 754.6505 282.0578 754.6505 282.0581 754.6507
## sd   153.8772 251.0847 153.8772 251.0847 153.8775 251.0848
cbind(HMM_gps_crw_NAgaps$mle$angle, HMM_gps_crw_NAgaps2$mle$angle, HMM_RF$mle$angle)
##               resident   travel resident   travel state 1  state 2
## mean          0.000000  0.00000 0.000000  0.00000 0.00000  0.00000
## concentration 2.277153 18.01728 2.277153 18.01728 2.27715 18.01732
cbind(HMM_gps_crw_NAgaps$mle$gamma, HMM_gps_crw_NAgaps2$mle$gamma, HMM_RF$mle$gamma)
##            resident     travel   resident     travel    state 1    state 2
## resident 0.93096716 0.06903284 0.93096716 0.06903284 0.93096679 0.06903321
## travel   0.07151298 0.92848702 0.07151298 0.92848702 0.07151399 0.92848601

Still looking very similar!

momentuHMM has functions to help select initial parameters for complex models. However, these functions rely on the user choosing initial parameter values for a simple model like the one we just explored.

5.4 Pseudo-residuals

We fitted a model to the data and it looks like the parameter estimates are reliable, but is this a good model for our data? Can we use the model results? The best way to investigate model fit is through pseudo-residuals. Pseudo-residuals are a type of model residual that accounts for the interdependence of observations. They are calculated for each of the time series (e.g., you will have pseudo-residuals for the step length time series and for the turning angle time series). If the fitted model is appropriate, the pseudo-residuals produced by the function pseudoRes should follow a standard normal distribution. You can look at pseudo-residuals directly via the function plotPR, which plots the pseudo-residual time series, the qq-plots, and the autocorrelation functions (ACF).

plotPR(HMM_gps_crw_NAgaps)
## Computing pseudo-residuals...

Both the qq-plot and the ACF plot indicate that the model does not fit the step length time series particularly well. The ACF indicates that there is severe autocorrelation remaining in the time series, even after accounting for the persistence in the underlying behavioural states.
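You can also extract the pseudo-residuals and run your own diagnostics; a short sketch with base R graphics:

```r
# compute pseudo-residuals and inspect the step-length series directly
pr <- pseudoRes(HMM_gps_crw_NAgaps)
qqnorm(pr$stepRes); qqline(pr$stepRes) # should look standard normal if the model fits
acf(na.omit(pr$stepRes))               # remaining autocorrelation in step lengths
```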

This could indicate that there are more hidden behavioural states. Let’s try a 3-state HMM.

# Setting up the starting values
mu3 <- c(100, 600, 1000) # Mean step length
sigma3 <- mu3/2 # SD of the step length
kappa3 <- c(0.1, 1, 1) # Turn angle concentration parameter
# combine starting parameters 
Par3 <- list(step = c(mu3, sigma3), angle = kappa3)

# Fit a 3-state HMM
HMM_gps_crw_NAgaps3 <- fitHMM(prep_tracks_gps_crw_NAgaps, 
                              nbStates = 3, 
                              dist = dist, Par0 = Par3)

plot(HMM_gps_crw_NAgaps3)
## Decoding state sequence... DONE

If we look at the pseudo-residuals

plotPR(HMM_gps_crw_NAgaps3)
## Computing pseudo-residuals...

It’s better. Not perfect, but there is less unexplained autocorrelation, especially in the step lengths. If we look at the step length distributions, we can see that we still have a state with really small steps (maybe resting?), a state with medium-sized steps (maybe foraging?), and a state with large steps (maybe travelling?). Looking at the track, the movement in state 2 does look more tortuous and concentrated in focal areas, while the movement in state 3 is very directed and spans larger areas.
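Beyond eyeballing the pseudo-residuals, we can compare the two models formally: the AIC method for momentuHMM objects accepts several fitted models at once and returns a comparison table. A quick sketch using the models fitted above:

```r
# Compare the 2-state and 3-state models by AIC (lower is better);
# valid here because both models were fitted to the same data
AIC(HMM_gps_crw_NAgaps, HMM_gps_crw_NAgaps3)
```

Keep in mind that AIC often favours models with more states, so it should complement, not replace, the pseudo-residual checks and biological reasoning.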

6 Adding covariates to the model

We can add covariates to our HMM to better understand drivers of behaviour. We can either include covariates on the transition probabilities (i.e., the state process) or in the state-dependent distributions (i.e., the observation process). Respectively, these address two questions: what drives the animal to switch between behavioural states, and how does movement within a given state vary?

6.1 Covariates on transition probabilities

It’s probably most common to include covariates on the transition probabilities in an HMM to assess how spatial, temporal, or demographic factors influence the state-switching dynamics. For example, prey density may increase the probability of switching into a foraging state or states may follow temporal patterns throughout the day. At time \(t\), each transition probability is linked to covariates using a multinomial logit link \[\gamma_{ij}^{(t)} = Pr(S_{t+1} = j \mid S_t = i) = \frac{\exp(\eta_{ij}^{(t)})}{\sum_{k=1}^{K}\exp(\eta_{ik}^{(t)})} \] with the linear predictor for \(P\) covariates \(\{\omega_1^{(t)}, \omega_2^{(t)}, \dots \omega_P^{(t)}\}\) given as \[\eta_{ij}^{(t)} = \begin{cases} \beta^{(ij)}_0 + \sum_{p=1}^{P} \beta_p^{(ij)} \omega_p^{(t)} & \text{if } i \neq j \\ 0 & \text{otherwise.} \end{cases} \] For each transition probability, \(\beta_0^{(ij)}\) is an intercept parameter, and \(\beta_p^{(ij)}\) measures the effect of the \(p\)-th covariate \(\omega_p\).
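To make the link function concrete, here is a small base-R sketch that builds a 2-state transition probability matrix from the multinomial logit formula above; the coefficient values are illustrative, not estimates:

```r
# Illustrative coefficients for a 2-state model with a cyclic time-of-day covariate
beta <- cbind("1 -> 2" = c(-2.57, 0.18, 0.00),
              "2 -> 1" = c(-2.49, 0.30, -0.16))
tod <- 12                                           # time of day in hours
x <- c(1, cos(2 * pi * tod / 24), sin(2 * pi * tod / 24))  # design vector

# Linear predictors for the off-diagonal entries (diagonal entries have eta = 0)
eta12 <- sum(x * beta[, 1])
eta21 <- sum(x * beta[, 2])

# Multinomial logit link: exponentiate and normalise so each row sums to 1
gamma <- matrix(c(1, exp(eta21), exp(eta12), 1), nrow = 2)
gamma <- gamma / rowSums(gamma)
gamma  # 2 x 2 transition probability matrix at tod = 12
```

This is exactly the calculation momentuHMM performs internally at every time step when a formula is supplied for the transition probabilities.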

Here, we will assess whether transition probabilities vary as a function of time of day, and we will use a 2-state model for demonstration. The first step is to define our model formula. Since time of day is a cyclic covariate, we must define a relationship that ensures that the transition probabilities are similar at the very end and very beginning of the day. We can do this by specifying the transition probabilities as a trigonometric (i.e., cyclic) function of time of day (defined as \(\tau\)),

\[\eta_{ij}^{(t)} = \begin{cases} \beta^{(ij)}_0 + \beta_1^{(ij)} \cos \left(\frac{2 \pi \tau_t}{24}\right) + \beta_2^{(ij)} \sin \left(\frac{2 \pi \tau_t}{24}\right) & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} ,\] where 24 refers to the 24-hour period of interest. This relationship can conveniently be specified with the cosinor function.

# set new formula
formula <- ~cosinor(tod, period = 24)

Next, we need to specify initial parameters, and we will use the function getPar0 to do so. This will use the estimated parameters of a previously fitted 2-state model with no covariates. Note that it will set the unknown linear predictor coefficients to 0 and use the intercepts of the previous model.

# set tpm formula and extract initial values from simpler model
Par0_m2 <- getPar0(model = HMM_gps_crw_NAgaps, formula = formula)
Par0_m2
## $Par
## $Par$step
##   mean_1   mean_2     sd_1     sd_2 
## 282.0578 754.6505 153.8772 251.0847 
## 
## $Par$angle
## concentration_1 concentration_2 
##        2.277153       18.017285 
## 
## 
## $beta
##                                 1 -> 2    2 -> 1
## (Intercept)                  -2.601642 -2.563677
## cosinorCos(tod, period = 24)  0.000000  0.000000
## cosinorSin(tod, period = 24)  0.000000  0.000000
## 
## $delta
## [1] 0.3025479 0.6974521

Then we can fit the model. The formula argument defines the model formula for the transition probabilities, and we need to set the correct initial parameters (i.e., Par0 for the observation parameters and beta0 for the transition probability model).

# fit model
m_tod_tpm <- fitHMM(prep_tracks_gps_crw_NAgaps,
                    nbStates=2,
                    dist=dist,
                    Par0=Par0_m2$Par,
                    beta0=Par0_m2$beta,
                    formula=formula)
m_tod_tpm$mle$beta
##                                    1 -> 2     2 -> 1
## (Intercept)                  -2.569164871 -2.4871784
## cosinorCos(tod, period = 24)  0.182094271  0.2961311
## cosinorSin(tod, period = 24)  0.004107689 -0.1570752

There are several ways to visualise the relationship. If you use the standard plot() function on the fitted HMM, you can visualise the transition probabilities directly. Another option is to plot the stationary state probabilities, which represent how likely the animal is to be in each behaviour at each time of day.

plotStationary(m_tod_tpm, plotCI = TRUE)
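Under the hood, the stationary probabilities come from solving \(\delta \Gamma = \delta\) (subject to \(\sum_i \delta_i = 1\)) for the transition probability matrix implied by each covariate value. A small base-R sketch for a single, illustrative 2-state matrix:

```r
# Illustrative 2-state t.p.m. (values close to those estimated earlier)
Gamma <- matrix(c(0.935, 0.065,
                  0.070, 0.930), nrow = 2, byrow = TRUE)

# Standard trick for Markov chains: solve delta %*% (I - Gamma + 1) = 1,
# where "+ 1" adds one to every element
delta <- solve(t(diag(2) - Gamma + 1), rep(1, 2))
delta  # stationary probabilities of states 1 and 2
```

plotStationary repeats this calculation over a grid of covariate values (here, time of day) to produce the curves.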

6.2 Covariates in the observation model

This section will illustrate how to model mean step length as a function of time of day. Any parameter in the observation model (e.g., mean/sd of the step lengths, or turning angle parameters) can be modelled as a function of a covariate, which allows the movement behaviour within each discrete state to vary. Here, we will use the same trigonometric function described in the previous section to model mean step length based on the time of day. To do so, we specify DM, which will be a list of formulas for the step length (and possibly other) parameters. For simplicity, we will only model the mean, but it is common to also model the standard deviation with the same relationship. We can derive the initial parameters using the getPar0 function, as described in the previous section.

set.seed(1)

# define step formula and design matrix
DM <- list(step = list(mean = formula, sd = ~1))

# extract initial values from simpler 2-state model with no covariates
Par0_m3 <- getPar0(model = HMM_gps_crw_NAgaps, DM = DM)

Now we can fit the HMM with covariate-dependent observation parameters.

# fit model
m_tod_obs <- fitHMM(prep_tracks_gps_crw_NAgaps, 
                    nbStates = 2, 
                    dist = dist, 
                    Par0 = Par0_m3$Par,
                    beta0 = Par0_m3$beta, 
                    DM = DM)
m_tod_obs
## Value of the maximum log-likelihood: -17849.32 
## 
## 
## Regression coeffs for step parameters:
## --------------------------------------
##      mean_1:(Intercept) mean_1:cosinorCos(tod, period = 24)
## [1,]           5.665086                         -0.02765113
##      mean_1:cosinorSin(tod, period = 24) mean_2:(Intercept)
## [1,]                           0.0438113           6.596116
##      mean_2:cosinorCos(tod, period = 24) mean_2:cosinorSin(tod, period = 24)
## [1,]                          -0.1421544                           0.1260971
##      sd_1:(Intercept) sd_2:(Intercept)
## [1,]         5.077695         5.467302
## 
## step parameters (based on mean covariate values):
## -------------------------------------------------
##       state 1  state 2
## mean 297.9917 854.4029
## sd   160.4039 236.8204
## 
## angle parameters:
## -----------------
##                state 1  state 2
## mean          0.000000  0.00000
## concentration 2.345858 18.60938
## 
## Regression coeffs for the transition probabilities:
## ---------------------------------------------------
##                1 -> 2    2 -> 1
## (Intercept) -2.673261 -2.586582
## 
## Transition probability matrix:
## ------------------------------
##            state 1    state 2
## state 1 0.93543027 0.06456973
## state 2 0.07000697 0.92999303
## 
## Initial distribution:
## ---------------------
##   state 1   state 2 
## 0.2959291 0.7040709

The regression coefficients are difficult to interpret for a cyclic function, but we can view the estimated relationship for each state with confidence intervals. Note that when the model has covariates, you can include plotCI = T to show the 95% confidence intervals on the estimated values.

plot(m_tod_obs, ask = F, plotCI = T, plotTracks = F)
## Decoding state sequence... DONE

This shows that the mean step length within each state varies with time of day, with longer steps in the morning. State 1 has a mean ranging from roughly 280-300 m, and state 2 ranges from roughly 600-900 m.
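Because the step length mean uses a log link, the fitted curve can be reconstructed directly from the regression coefficients in the model summary above. A base-R sketch for state 1:

```r
# State-1 mean step length as a function of time of day, rebuilt from the
# log-link coefficients reported in the model summary
tod <- seq(0, 24, by = 0.25)
mean1 <- exp(5.665086 +
             -0.02765113 * cos(2 * pi * tod / 24) +
              0.0438113  * sin(2 * pi * tod / 24))
range(mean1)  # roughly 274 to 304 m
```

This is the same quantity plot() draws for state 1, minus the confidence intervals (which require the full variance-covariance matrix of the estimates).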

As the observation data are the same across these models and the model in section 4.2, we can use AIC to compare models with and without time of day effects.

AIC(HMM_gps_crw_NAgaps)  # without time of day
## [1] 35787.81
AIC(m_tod_tpm)  # time of day effect on transition probabilities
## [1] 35792.69
AIC(m_tod_obs)  # time of day effect on observation parameters
## [1] 35724.65

This indicates that the model with a time-varying mean step length is the best-fitting model.